Comparing Taxonomies for Organising Collections of Documents

Authors

  • Samuel Fernando
  • Mark M. Hall
  • Eneko Agirre
  • Aitor Soroa
  • Paul D. Clough
  • Mark Stevenson
Abstract

There is a demand for taxonomies that organise large collections of documents into categories for browsing and exploration. This paper examines four existing, manually created taxonomies, along with two methods for deriving taxonomies automatically from data items. We use these taxonomies to organise items from a large online cultural heritage collection and then present two human evaluations of them. The first measures the cohesion of each taxonomy, that is, how well it groups similar items under the same concept node. The second analyses the concept relations in the taxonomies. The results show that the manual taxonomies have high-quality, well-defined relations. However, the novel automatic method is found to generate taxonomies with very high cohesion.


Similar resources

A novel self-organising clustering model for time-event documents

Purpose: Neural document clustering techniques, e.g., self-organising map (SOM) or growing neural gas (GNG), usually assume that textual information is stationary on the quantity. However, the quantity of text is ever-increasing. We propose a novel dynamic adaptive self-organising hybrid (DASH) model, which adapts to time-event news collections not only to the neural topological structure but al...


Protection of Archival Documents from Photochemical Effects

Purpose: The purpose of this paper is to highlight the destructive effects of light on archival documents/paper materials. The research aims to explain the mechanism of photochemical degradation and the damaging effect of light on paper. It also tells us about the measures to be adopted to control the deteriorating effects of light on paper step by step. Design/Methodology/Approach: The res...


A Probabilistic Hierarchical Clustering Method for Organising Collections of Text Documents

In this paper a generic probabilistic framework for the unsupervised hierarchical clustering of large-scale sparse high-dimensional data collections is proposed. The framework is based on a hierarchical probabilistic mixture methodology. Two classes of models emerge from the analysis and these have been termed as symmetric and asymmetric models. For text data specifically both asymmetric and sy...


Associative Conceptual Space-based Information Retrieval Systems

In this ‘Information Era’ with the availability of large collections of books, articles, journals, CDROMs, video films and so on, there exists an increasing need for intelligent information retrieval systems that enable users to find the information desired easily. Many attempts have been made to construct such retrieval systems, including the electronic ones used in libraries and including the...


Visualisations for Comparing Self-organising Maps

Self-organising Maps (SOMs) are a very useful method for exploring and analysing large data collections: They project high-dimensional data into a low-dimensional output space so that it is easier to analyse for humans than the original data. For the purpose of analysis, plenty of visualisations exist which display different aspects and properties of the maps and the data. There are, however, v...




Publication date: 2012